Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
Authors
Abstract
Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases. We evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.
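The abstract does not spell out the authors' probabilistic model. For intuition only, the sketch below uses a much simpler and different technique, the classical Lincoln-Petersen capture-recapture estimator. It shows how overlap between two incomplete thesauri can be used to estimate how many synonym pairs exist in total, and how many appear in neither source. The function names (`pair`, `lincoln_petersen`, `fraction_undocumented`) and the toy thesauri are hypothetical illustrations, not taken from the study. The estimator assumes the two sources are independent and that every true pair is equally likely to be recorded. The abstract's model is described as handling biases of exactly this kind.

```python
# Toy illustration only: a classical Lincoln-Petersen capture-recapture
# estimate, NOT the probabilistic model described in the abstract.

def pair(term1: str, term2: str) -> frozenset:
    """Represent an undirected synonym relationship as an unordered pair."""
    return frozenset((term1, term2))


def lincoln_petersen(list_a: set, list_b: set) -> float:
    """Estimate the total number of synonym pairs from two thesauri.

    Assumes (hypothetically) that both thesauri were compiled independently
    and that every true pair is equally likely to be recorded; real
    terminologies often violate both assumptions.
    """
    overlap = len(list_a & list_b)
    if overlap == 0:
        raise ValueError("thesauri share no pairs; estimate is undefined")
    return len(list_a) * len(list_b) / overlap


def fraction_undocumented(list_a: set, list_b: set) -> float:
    """Estimated share of all pairs that appear in neither thesaurus."""
    total = lincoln_petersen(list_a, list_b)
    documented = len(list_a | list_b)
    return 1.0 - documented / total


# Hypothetical toy thesauri (not data from the study).
thesaurus_a = {
    pair("heart attack", "myocardial infarction"),
    pair("tumor", "neoplasm"),
    pair("kidney failure", "renal failure"),
    pair("fever", "pyrexia"),
}
thesaurus_b = {
    pair("myocardial infarction", "heart attack"),
    pair("neoplasm", "tumor"),
    pair("hypertension", "high blood pressure"),
}

print(lincoln_petersen(thesaurus_a, thesaurus_b))       # 6.0
print(fraction_undocumented(thesaurus_a, thesaurus_b))  # about 0.167
```

Storing pairs as `frozenset`s makes "A, B" and "B, A" the same relationship, so the overlap count is order-independent. Real thesauri often copy from one another or favor the same common terms. That shared bias can inflate the overlap and shrink the estimate, so a naive estimator like this one may understate how much synonymy is missing.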
Similar resources
Quantifying the Impact and Extent of Undocumented Biomedical Synonymy Supporting Information
Consistent with previous observations [1, 2, 3], we noticed that many of the terms contained within the UMLS Metathesaurus were inappropriate for natural language-oriented analyses (e.g., database-specific encodings, machine permutations, non-English entries). Therefore, prior to generating the terminologies utilized in this study, we subjected the Metathesaurus to a thorough, rule...
Quantifying the Impact of Eliminating Oil Revenue on Oil Exporters' Macroeconomy
The sudden drop in crude oil prices during the Coronavirus pandemic once again raises concern about the future of oil-exporting countries should oil revenues diminish. This paper uses a multi-country general equilibrium model to project the effects of eliminating oil revenues. The model consists of 5 oil exporters (i.e., Iran, Kuwait, Saudi Arabia, Kazakhstan and Russia), 25 non-oil exporting countries, and the rest o...
Evaluating Seismic Effects on a Water Supply Network and Quantifying Post-Earthquake Recovery
This paper summarises the impact of the major 2010–2011 earthquakes on Christchurch’s water supply network, the recovery measures that were applied, what worked well, what did not and why. A number of issues related to the open nature of the Christchurch water supply network were identified during the earthquakes. It was difficult to manage large water supply pressure zones during the post-earthq...
Identity of the previously unrecognized Chetogena flaviceps and its synonymy with C. scutellaris (Diptera: Tachinidae)
The previously unrecognized type specimen of Chetogena flaviceps (Bigot) was studied, and its synonymy with Chetogena scutellaris Wulp was confirmed. The genitalia of the type specimen are illustrated. A redescription of C. scutellaris is provided, together with the characters that distinguish the male from other species.
The Impact of Undocumented Immigration on ID Theft in the United States: An Empirical Study
According to the U.S. Federal Trade Commission, identity theft constituted the number one consumer complaint in the United States in 2006. Using state-level data for the 50 states for 2005, we find, among other things, that the rate of reported identity theft per 100,000 population is directly related to the unemployment rate, the percent of the population residing in urban areas, and the exten...